Digital Libraries and Document Image Analysis

Author

  • Henry S. Baird
Abstract

The rapid growth of digital libraries (DLs) worldwide poses many new challenges for document image analysis (DIA) research and development. DLs promise to offer more people access to larger document collections, and at far greater speed, than physical libraries can. But DLs also tend, for many reasons, to serve poorly, or even to omit entirely, many types of non-digital human–legible media, such as originally printed and handwritten documents. These media, in their original physical (undigitized) form, are readily — if not always quickly — legible, searchable, and browseable, whereas in the form of document images accessed through DLs they often lose many of their original advantages while of course lacking many advantages of symbolically encoded information. The author explores these issues and illustrates them with brief case studies arising from his experience as a DIA researcher in collaboration with several DL projects in the US. Difficult open DIA technical problems in DL applications are identified in the contrasting advantages of paper and digital displays, at every stage of capture, early processing, recognition, analysis, presentation, & retrieval, and in personal and interactive applications. These support the conclusion that the international DIA R&D community is urgently needed (because uniquely qualified) to provide new technology to help rescue from neglect — even, in many cases, eventual oblivion — the world’s vast culturally irreplaceable legacy paper document collections.

Similar resources

Digital Libraries and Document Image Analysis Techniques: a Survey

Nowadays, Digital Libraries have become a widely used service to store and share both digital born documents and digital versions of works stored by traditional libraries. Document images are intrinsically non-structured and the structure and semantic of the digitized documents is in most part lost during the conversion. Several techniques related to the Document Image Analysis research area ha...

Digital Libraries and Document Image Retrieval Techniques: A Survey

Nowadays, Digital Libraries have become a widely used service to store and share both digital born documents and digital versions of works stored by traditional libraries. Document images are intrinsically non-structured and the structure and semantic of the digitized documents is in most part lost during the conversion. Several techniques related to the Document Image Analysis research area ha...

Digitizing a Million Books: Challenges for Document Analysis

This paper describes the challenges for document image analysis community for building large digital libraries with diverse document categories. The challenges are identified from the experience of the on-going activities toward digitizing and archiving one million books. Smooth workflow has been established for archiving large quantity of books, with the help of efficient image processing algorithms....

Tools for Document Image Retrieval in Digital Libraries: the AIDI System

In the last few years, Digital Libraries became one important application area for Document Image Analysis and Recognition research [1]. In this field, a relevant line of research is Document Image Retrieval (DIR) that aims at finding relevant documents relying on image features only. DIR techniques are used to index not only the textual content of a document, but also its layout, graphical obj...

Document Image Dewarping Based on Text Line Detection and Surface Modeling (RESEARCH NOTE)

Document images produced by scanner or digital camera, usually suffer from geometric and photometric distortions. Both of them deteriorate the performance of OCR systems. In this paper, we present a novel method to compensate for undesirable geometric distortions aiming to improve OCR results. Our methodology is based on finding text lines by dynamic local connectivity map and then applying a l...

The Role of Digital Reality Technologies in Libraries: A Systemic Review

Introduction: Fourth-generation libraries can no longer be satisfied with web software facilities and must use technologies, including digital realities to increase the level of service and attract clients. Therefore, this review aimed to identify the roles and effects of these technologies in libraries. Methods: In this systematic review, PubMed, Web of Science, Scopus, and Google Scholar adv...

Publication date: 2003